Author:

 

Jesse Clous

jclous@gmail.com

 

American River College, Fall 2009

Geog. 350 Introduction to Data Acquisition

 

 

 

 

 

Barry Bonds: A GIS Analysis to Homerun Production

 

 

 

---------------------------------------------------------------------------------

Abstract

The major focus of the project was to take attributes leading to Barry Bonds’ homerun production and display them graphically. This project geo-references all of Barry Bonds’ homerun events for cross-examination with the average temperature of homerun event locations. To perform attribute queries about homerun production and spatial queries with climate data, a GIS model was created. The GIS model performs complex queries about Barry Bonds’ homerun attribute information and spatial relationships to average temperature at homerun locations. Barry hit most of his homeruns in San Francisco, his most productive month was August, and he hit the most homeruns in temperatures around 50-55 degrees.

---------------------------------------------------------------------------------

Introduction

 

In this report, GIS is used to analyze and calculate attributes that relate to the total homerun production of Barry Bonds. Homerun production will also be cross-examined with the average climate of homerun locations. Barry Bonds is the lifetime homerun leader for Major League Baseball with 762 career homeruns (cbssports).

 

Goals:

  • Create a GIS capable of performing dynamic attribute queries relating to Barry Bonds’ homerun production
  • Create maps that summarize Barry Bonds’ homerun production attributes

 

Objectives:

  • Analyze and describe homerun production by stadium
  • Analyze and describe homerun production by pitcher hand
  • Analyze and describe homerun production by month
  • Analyze and describe homerun production by average temperature
  • Find interesting statistical information resulting from attribute and spatial querying

This project will study attributes leading to Barry Bonds’ all-time homerun record and will answer the following questions:

 

Where did Barry Bonds hit most of his homeruns?

Where did he hit more homeruns off of left handed pitchers vs. right?

What month did he hit the most homeruns and where?

In what climate did he hit the most homeruns?

 

To answer these questions a complete homerun log of Barry Bonds will be needed along with field locations and average climate information. Once these are input into the GIS, analysis can be performed to answer the objective questions. Answering them requires both attribute and spatial analysis through the use of GIS. Questions such as what pitcher Barry Bonds hit the most homeruns off of, what day and where Barry Bonds hit homerun 500, and how many homeruns Barry Bonds hit against the Angels off left handed pitching in June 1999 can be answered by attribute queries.

 

A GIS model will be created to perform attribute querying, spatial querying, and quantitative analysis, and will then be used to map spatial query results. Attribute tables and new layers will be created to analyze the performed queries. Functions such as Summarize and Statistics will be used to calculate nominal homerun data.

 

Depending on the availability of the data and the scope of analysis to be performed, this project will take approximately 40 hours, spread out through the academic semester. This is a student project for Geog 334, Introduction of Software Applications, and is intended to be presented to the class to display the use of introductory GIS application functions involved in the project. The results of the project are also intended for anyone who is interested in Barry Bonds and wants statistical information about his lifetime homerun record. Results of the project will be presented in an in-class PowerPoint oral report, including maps and analytical results about homerun production.

---------------------------------------------------------------------------------

Background

 

Professional baseball today is increasingly using GIS applications to perform functions that were unavailable prior to GIS. The Oakland Athletics created a raster model to measure the impact opposing players and teams have on them and to analyze the impact of their own players, as well as prospective players, on the organization (Lewis). Construction of the New Pacific Northwest Park, a 455 million dollar stadium and home of the Seattle Mariners, used GIS to design and manage construction of the new baseball field. Sports Illustrated uses GIS to show attributes about individual athletes and graphically show relationships among them. Major league teams use GIS to analyze their fan base and direct marketing and advertising efforts.

 

 

Oakland Athletics and Moneyball

 

The Oakland Athletics are always in the hunt for the playoffs even though they have one of the smallest payrolls in baseball. Why? The A’s have incorporated GIS in their franchise to manage players and look for new talent. Michael Lewis, author of Moneyball, details how the Athletics use technology to their advantage. They use GIS to evaluate current and prospective players by spatially rating players’ plays through raster graphics of a baseball field, so they acquire talent not by price and statistics alone but by what attributes they are lacking or have an excess of, and make personnel decisions based on that. A good overview of the GIS application by the Oakland A’s is available at http://rose.geog.mcgill.ca/wordpress/?p=482


New Pacific Northwest Park (Safeco Field)

 

Prior to the development of Safeco Field in Seattle, Washington, a GIS model was created to both design and manage the construction phasing of the ballpark. Using a GIS model allowed the design to be created around the end user. Attribute tables about seating view, physical appearance, and proximity to commodities were connected to each section of seating in a shapefile. The model also included construction materials and phasing of the project. GIS allowed the developer to design efficiently while keeping tabs on the budget. A summary of the GIS model is available at http://www.integralgis.com/pdf/stadium.pdf


Sports Illustrated: Great American Sports Atlas

 

What started out as a 50th anniversary campaign for Sports Illustrated (SI) became an exclusive 32-page article highlighting GIS, resulting in the SI Sports Atlas. SI wanted to run a weekly feature for its 50th year highlighting the best athlete from each state and describing where the player came from. Then it grew into all players of all sports, which in turn became the basis for the GIS model and the beginning of the Sports Atlas.

 

This article appears to mark the beginning of the use of complex GIS in sports. It can be used by scouts to see what regions to target for particular kinds of athletes or by general fans for entertainment. Team owners are using GIS to view their fan bases and market the team. After reading this article I am positive that there is a huge future for GIS in sports.


Develop a Suitable Model

 

A model was developed prior to data acquisition to define the methodology for developing the GIS. Built with ModelBuilder in ArcMap, the detailed model lays out the GIS functions and the structure used to develop the GIS.

 

 

---------------------------------------------------------------------------------

 

Data Collection

 

Data needed for the GIS model include a Barry Bonds homerun log with locations of homeruns, homerun number, team against, team for, pitcher, pitcher hand, and date. X, Y coordinates of the baseball fields will be needed to create field location points. Average climate information will also be needed to perform spatial analysis against homeruns.

 

Barry Bonds homerun data was found at cbssports.com and was available in .txt format. From the cbssports.com website the homerun log was copied and pasted directly into a Microsoft Excel spreadsheet. Attributes of this table, called New_Homerun_log, include Homerun, Date, Pitcher, Team Against, Team for, and Field. The raw data presented a few major problems once pasted into Excel: the date column was in dd/mm/yyyy format and needed to be separated into its parts for attribute querying, and there were spaces in the column headings. Also, there were no x,y attributes for field locations. The date column was cut and pasted into Microsoft Word and the find and replace function was used to replace all slashes with commas. Next the newly created Word document was opened and saved in Notepad so the data could be imported back into Excel in comma delimited format.
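The Word and Notepad round trip described above could also be scripted. The following is a minimal Python sketch of the date split, not the procedure used in this project. The function name split_date and the output keys day, month, and year are assumptions for illustration, and the field order follows the dd/mm/yyyy format stated above.

```python
def split_date(date_text, order=("day", "month", "year")):
    """Split a slash-delimited date string into separate integer fields."""
    parts = [int(p) for p in date_text.strip().split("/")]
    if len(parts) != len(order):
        raise ValueError(f"expected {len(order)} fields in {date_text!r}")
    return dict(zip(order, parts))


# Homerun 431 (August 20, 1999) written in the dd/mm/yyyy layout.
row = {"Homerun": 431, "Date": "20/08/1999"}
row.update(split_date(row["Date"]))
print(row["day"], row["month"], row["year"])
```

Keeping the parts as integers lets a later query compare month or year values directly instead of matching text.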

 

The column headings were changed to remove all spaces, and x,y attributes were added from the field location table discussed next. Once the date data was imported back into Excel, a complete homerun log with homerun number, locations of homeruns, team against, team for, pitcher, pitcher hand, and date was created.
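Adding x,y attributes to the homerun log amounts to a table join on the field name. A minimal Python sketch of that lookup follows. It is an illustration rather than the Excel steps used here; the column names X and Y, the function name add_coordinates, and the park names and coordinates are all placeholder assumptions.

```python
def add_coordinates(homeruns, fields):
    """Copy each field's X and Y onto every homerun hit at that field."""
    lookup = {f["Field"]: (f["X"], f["Y"]) for f in fields}
    joined = []
    for hr in homeruns:
        # Homeruns at a field missing from the table get empty coordinates.
        x, y = lookup.get(hr["Field"], (None, None))
        joined.append({**hr, "X": x, "Y": y})
    return joined


# Placeholder rows, not real stadium coordinates.
fields = [{"Field": "Park A", "X": -100.0, "Y": 40.0}]
homeruns = [
    {"Homerun": 1, "Field": "Park A"},
    {"Homerun": 2, "Field": "Park B"},
]
for record in add_coordinates(homeruns, fields):
    print(record)
```

Rows that come back with empty coordinates point to field names spelled differently in the two tables, which is worth checking before importing into ArcMap.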

 

 

Homerun Log

 

Next, stadium locations needed to be defined. A .kml file found at 252.pair.com provided an XML listing of all current stadiums in Major League Baseball (Pair.com). The .kml file had been produced with Google Earth and was available in .html format. The data was cut and pasted into Notepad and then imported into Excel, separating column headings at line breaks. The spreadsheet was cleaned up to contain just the stadium name and x, y coordinates so that it could be imported into ArcMap, giving the GIS all field locations (App. 4). The x,y data for each field location was added to the homerun log so that individual homeruns could be spatially referenced.
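Because a .kml file is XML, the stadium names and coordinates can also be pulled out with a short script instead of a Notepad and Excel cleanup. The Python sketch below uses only the standard library. The sample KML text is a made-up placeholder, not the file used in this project, and the function names are assumptions. KML stores each coordinate as longitude, latitude, altitude.

```python
import xml.etree.ElementTree as ET

# Placeholder KML with one made-up placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example Park</name>
      <Point><coordinates>-100.5,40.25,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def placemarks_to_rows(kml_text):
    """Return one {'Field', 'X', 'Y'} row per Placemark that has a point."""
    rows = []
    for elem in ET.fromstring(kml_text).iter():
        if local_name(elem.tag) != "Placemark":
            continue
        name, coords = None, None
        for child in elem.iter():
            tag = local_name(child.tag)
            if tag == "name" and name is None:
                name = (child.text or "").strip()
            elif tag == "coordinates" and coords is None:
                coords = (child.text or "").strip()
        if name and coords:
            lon, lat = (float(v) for v in coords.split(",")[:2])
            rows.append({"Field": name, "X": lon, "Y": lat})
    return rows


print(placemarks_to_rows(SAMPLE_KML))
```

Matching on the local tag name keeps the sketch independent of which KML namespace version the file declares.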

 

.kml File

 

 

X, Y Table

Field Locations

 

Average temperature data was found at the National Climatic Data Center website. A complete shapefile named TEMP0313.shp is available there, containing median national temperatures from 1995 to 2000. The data was downloaded from the site and was ready to be imported into ArcMap.

TEMP0313

 

The states shapefile, states.shp, came from Environmental Systems Research Institute, Inc. (ESRI) and was obtained via the student folder.

 

The projection used in the GIS model is Equidistant_Conic and all data sets will either be defined or redefined to fit that projection. 

 

Imported Shapefiles and Data

 

File Name        Type        Geographic Coordinate System   Projection
TEMP0313.shp     Climate     GCS_NAD_1927_CG077             Unprojected
States.shp       State       GCS_North_American_1983        North_American_Equidistant_Conic
X,y.xml          Ballparks   UTM Long/Lat                   Unprojected
Homerunlog.xml   Homeruns    UTM Long/Lat                   Unprojected

 

Once all the data was collected and refined, it was imported into ArcMap. The x,y stadium location table was imported into ArcMap and its x,y coordinates were displayed as points. The data was exported as a .shp file and projected in North_American_Equidistant_Conic. The same process was followed for the homerun log table. The states.shp file was added to the GIS along with the TEMP0313.shp file. The TEMP0313.shp file was projected to North_American_Equidistant_Conic, matching the states.shp projection. All imported data is now projected to North_American_Equidistant_Conic, so layers can be created to perform spatial and attribute analysis.

 

New Shapefiles and Data

 

File Name          Type        Geographic Coordinate System   Projection
Climate.shp        Climate     GCS_NAD_1927_CG077             North_American_Equidistant_Conic
States.shp         State       GCS_North_American_1983        North_American_Equidistant_Conic
Ballparks.shp      Ballparks   UTM Long/Lat                   North_American_Equidistant_Conic
All_Homeruns.shp   Homeruns    UTM Long/Lat                   North_American_Equidistant_Conic

 

The climate.shp layer is a single shapefile whose attribute table holds nine different temperature values in the DEG_F column.

 

FID   SHAPE   ID   GRIDCODE   DEG_F

 

There are nine different temperature values in the climate attribute table. To perform spatial queries against homerun locations, each of the nine temperature values was selected individually (for example, "DEG_F" = 'B 32.0 - 40.0') and nine new layers were created, one for each temperature zone. Next, the spatial query was performed by selecting all homeruns that are within each of the newly created temperature layers, and a new homerun layer was created from each selection that contained homeruns.
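The "within" selection above is a point-in-polygon test applied to every homerun location. ArcMap performs it internally. The Python sketch below only illustrates the idea with a standard ray-casting test; it is not ArcMap's implementation. It assumes each zone is a single ring of (x, y) vertices with no holes, and the square zone and points are placeholder values.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_within(points, polygon):
    """Return the points whose X and Y fall inside the polygon."""
    return [p for p in points if point_in_polygon(p["X"], p["Y"], polygon)]


# Placeholder zone (a 10 x 10 square) and placeholder homerun points.
zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
points = [{"Homerun": 1, "X": 5, "Y": 5}, {"Homerun": 2, "X": 15, "Y": 5}]
print(select_within(points, zone))
```

Real temperature zones are irregular and may consist of several separate polygons, so a complete version would run the test against every part of a zone.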

 

New Climate Layers

>70.0, 65.1-70.0, 60.1-65.0, 55.1-60.0, 50.1-55.0, 45.1-50.0, 40.1-45.0, 32.0-40.0, <32.0

 


 

New Homerun Layers

4_HR, 5_HR, 6_HR, 7_HR, 8_HR, 9_HR

 

A homerun layer was not needed for all of the temperature zones because not all temperature zones contained homeruns.

 

Statistics for the attribute ‘field’ in the All_Homeruns attribute table were calculated to summarize homerun totals per stadium. This information was added to the Ballparks.shp attribute table.
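Summarizing on the field attribute is a count of records per distinct field value. A minimal Python sketch of the same tally follows. The function name and the park names are placeholders, not data from the homerun log.

```python
from collections import Counter


def homeruns_per_field(homeruns):
    """Count homerun records for each distinct value of the Field attribute."""
    return Counter(hr["Field"] for hr in homeruns)


# Placeholder records.
homeruns = [
    {"Homerun": 1, "Field": "Park A"},
    {"Homerun": 2, "Field": "Park B"},
    {"Homerun": 3, "Field": "Park A"},
]
totals = homeruns_per_field(homeruns)
for field, count in totals.most_common():
    print(field, count)
```

The resulting totals are what would be joined back onto the ballparks table, one count per stadium.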

 

An attribute query was performed on the All_Homeruns layer: "hand" = 'L' to select homeruns off left handed pitchers and "hand" = 'R' for homeruns off right handed pitchers. A layer was created for each of the selected attributes: LH_HR and RH_HR.

 

New Homerun Layers

LH_HR, RH_HR
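The two pitcher hand layers come from the same attribute selection run with different values. A minimal Python sketch of such a selection follows. The function name select_by_attribute is an assumption and the records are placeholders.

```python
def select_by_attribute(records, **criteria):
    """Keep the records whose attributes equal every given criterion."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in criteria.items())
    ]


# Placeholder records.
homeruns = [
    {"Homerun": 1, "Hand": "L"},
    {"Homerun": 2, "Hand": "R"},
    {"Homerun": 3, "Hand": "R"},
]
lh_hr = select_by_attribute(homeruns, Hand="L")
rh_hr = select_by_attribute(homeruns, Hand="R")
print(len(lh_hr), len(rh_hr))
```

Passing more than one criterion narrows the selection the same way AND does in a Select by Attribute expression.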

 

---------------------------------------------------------------------------------

 

Analysis

 

Where did Barry Bonds hit most of his homeruns?

 

Barry Bonds hit most of his homeruns at the two locations where he played the most games: 300 in San Francisco (Giants) and 96 in Pittsburgh (Pirates).

Homeruns

 

Where did he hit more homeruns off of left handed pitchers vs. right?

 

As with total homeruns, Barry hit the most homeruns off both right handed and left handed pitchers in San Francisco: 212 RH, 88 LH. Unlike total homeruns, Pittsburgh does not rank second in both categories. He hit his second most homeruns off right handed pitchers there, 59, but his second most off left handed pitchers came in Cincinnati. This may indicate a shortage of left handed pitching in the National League when Barry played for the Pittsburgh Pirates.

 

Homeruns by Pitcher Hand

 

What month did he hit the most homeruns and where?

 

Barry hit the most homeruns in the month of August, with 148. Of those, the most were hit in San Francisco, with 58.

 

In what climate did he hit the most homeruns?

 

Barry hit 326 homeruns in the 55.1-60.0 degree temperature zone.

 

Homeruns vs. Climate

 

Besides answering the project objective questions, the GIS model is capable of performing complex and highly specific attribute queries. Below is a specific question that can be answered through a combination of spatial and attribute queries.

 

How many homeruns did Barry hit in a climate range of 45.1-50.0 degrees against left handed pitchers in the month of August in 1999?

 

Select by Attribute for All_HR

"Year" = 1999 AND "Month" = 8 AND "Hand" = 'L'

 

Select by Location from the selected features in the All_HR layer that are within the features of the 45.1-50.0 layer.

 

On August 20, 1999, Barry Bonds hit homerun number 431 off left handed pitcher Rafael in Milwaukee.
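One possible alternative to chaining the two selections is to record the temperature zone on each homerun once, so that the combined question becomes a single attribute filter. The Python sketch below assumes a hypothetical Zone attribute holding the zone label. The first record reflects the result reported above, and the other two are placeholders.

```python
def count_matching(homeruns, year, month, hand, zone):
    """Count homeruns matching the year, month, pitcher hand, and zone."""
    return sum(
        1 for hr in homeruns
        if hr["Year"] == year
        and hr["Month"] == month
        and hr["Hand"] == hand
        and hr["Zone"] == zone
    )


homeruns = [
    # Homerun 431, as reported in the analysis above.
    {"Homerun": 431, "Year": 1999, "Month": 8, "Hand": "L", "Zone": "45.1-50.0"},
    # Placeholder records for illustration only.
    {"Homerun": 9001, "Year": 1999, "Month": 8, "Hand": "R", "Zone": "45.1-50.0"},
    {"Homerun": 9002, "Year": 1999, "Month": 8, "Hand": "L", "Zone": "55.1-60.0"},
]
print(count_matching(homeruns, 1999, 8, "L", "45.1-50.0"))
```

The trade-off is that the zone labels must be refreshed whenever the climate layer changes, whereas Select by Location always reads the current geometry.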

---------------------------------------------------------------------------------

Conclusion

 

The GIS model created in this report answers all objective questions and allows for specific attribute searches of Barry Bonds’ homerun production. Although the model is successful, there is much room for improvement. Creating graphics with this model was a challenge because most of the power in the GIS was based on attribute relationships, not spatial ones. This does, however, provide the grounds for an improved GIS model to apply to baseball. I would like to pursue this project in the future by applying raster graphics that detail individual plays. If I am able to locate play logs and charts I should be able to make a baseball game come to life in GIS. I would be able to show illustrated maps detailing spatial occurrences in baseball. Stay tuned for more GIS in baseball!

---------------------------------------------------------------------------------

References

 

252.pair.com. Web. 11 Oct 2009. <http://www252.pair.com/comdog/google_earth/major_league_baseball_stadiums.kml>.

 

"Bonds Career Homerun Log." CBSSportsMLB. CBSSPORTS, Web. 10 Oct 2009. <http://www.cbssports.com/mlb/bondstracker/bondslog>.

 

Lewis, Michael. Moneyball: The Art of Winning an Unfair Game. New York: Norton and Company, Inc., 2003. Print.

 

Moore, Patrick. "Building a Baseball Stadium Using GIS." integralgis.com. 05/14/2002. Integralgis Inc., Web. 7 Dec 2009. <http://www.integralgis.com/pdf/stadium.pdf>.

 

"ncdc.noaa.gov." 4/26/2006. National Climatic Data Center, Web. 10 Oct 2009. <http://cdo.ncdc.noaa.gov/climaps/TEMP0313.SHP>.